AITopics | reward probability

Collaborating Authors

reward probability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From Classical Data to Quantum Advantage -- Quantum Policy Evaluation on Quantum Hardware

Hein, Daniel, Wiedemann, Simon, Baumann, Markus, Felbinger, Patrik, Klein, Justin, Schieder, Maximilian, Stein, Jonas, Schuman, Daniëlle, Cope, Thomas, Udluft, Steffen

arXiv.org Artificial IntelligenceSep-10-2025

IQM Germany Abstract--Quantum policy evaluation (QPE) is a reinforcement learning (RL) algorithm which is quadratically more efficient than an analogous classical Monte Carlo estimation. It makes use of a direct quantum mechanical realization of a finite Markov decision process, in which the agent and the environment are modeled by unitary operators and exchange states, actions, and rewards in superposition. Previously, the quantum environment has been implemented and parametrized manually for an illustrative benchmark using a quantum simulator . In this paper, we demonstrate how these environment parameters can be learned from a batch of classical observational data through quantum machine learning (QML) on quantum hardware. The learned quantum environment is then applied in QPE to also compute policy evaluations on quantum hardware. Our experiments reveal that, despite challenges such as noise and short coherence times, the integration of QML and QPE shows promising potential for achieving quantum advantage in RL.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/QCE65121.2025.10306

2509.07614

Country: Europe > Germany (0.48)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

Add feedback

An Empirical Evaluation of Thompson Sampling

Neural Information Processing SystemsMar-15-2024, 13:06:17 GMT

Thompson sampling is one of oldest heuristic to address the exploration / exploitation trade-off, but it is surprisingly unpopular in the literature. We present here some empirical results using Thompson sampling on simulated and real data, and show that it is highly competitive. And since this heuristic is very easy to implement, we argue that it should be part of the standard baselines to compare against.

algorithm, probability, thompson, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Santa Clara (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Marketing (0.47)

Technology:

Information Technology > Data Science > Data Mining (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting

Neural Information Processing SystemsMar-13-2024, 17:13:10 GMT

How humans achieve long-term goals in an uncertain environment, via repeated trials and noisy observations, is an important problem in cognitive science. We investigate this behavior in the context of a multi-armed bandit task. We compare human behavior to a variety of models that vary in their representational and computational complexity. Our result shows that subjects' choices, on a trial-totrial basis, are best captured by a "forgetful" Bayesian iterative learning model [21] in combination with a partially myopic decision policy known as Knowledge Gradient [7]. This model accounts for subjects' trial-by-trial choice better than a number of other previously proposed models, including optimal Bayesian learning and risk minimization, ε-greedy and win-stay-lose-shift. It has the added benefit of being closest in performance to the optimal Bayesian model than all the other heuristic models that have the same computational complexity (all are significantly less complex than the optimal model). These results constitute an advancement in the theoretical understanding of how humans negotiate the tension between exploration and exploitation in a noisy, imperfectly known environment.

artificial intelligence, bayesian inference, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Diego County (0.14)

Genre: Research Report > New Finding (0.54)

Industry: Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Designing Optimal Behavioral Experiments Using Machine Learning

Valentin, Simon, Kleinegesse, Steven, Bramley, Neil R., Seriès, Peggy, Gutmann, Michael U., Lucas, Christopher G.

arXiv.org Artificial IntelligenceNov-26-2023

Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely, and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about what models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best account for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code and tutorial notebooks to replicate all analyses.

experiment, experimental design, posterior distribution, (16 more...)

arXiv.org Artificial Intelligence

2305.07721

Country:

North America > United States > New York (0.04)
North America > United States > Montana (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)
Instructional Material > Course Syllabus & Notes (0.86)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

Theory of Acceleration of Decision Making by Correlated Time Sequences

Okada, Norihiro, Yamagami, Tomoki, Chauvet, Nicolas, Ito, Yusuke, Hasegawa, Mikio, Naruse, Makoto

arXiv.org Artificial IntelligenceJul-15-2022

Photonic accelerators have been intensively studied to provide enhanced information processing capability to benefit from the unique attributes of physical processes. Recently, it has been reported that chaotically oscillating ultrafast time series from a laser, called laser chaos, provide the ability to solve multi-armed bandit (MAB) problems or decision-making problems at GHz order. Furthermore, it has been confirmed that the negatively correlated time-domain structure of laser chaos contributes to the acceleration of decision-making. However, the underlying mechanism of why decision-making is accelerated by correlated time series is unknown. In this study, we demonstrate a theoretical model to account for accelerating decision-making by correlated time sequence. We first confirm the effectiveness of the negative autocorrelation inherent in time series for solving two-armed bandit problems using Fourier transform surrogate methods. We propose a theoretical model that concerns the correlated time series subjected to the decision-making system and the internal status of the system therein in a unified manner, inspired by correlated random walks. We demonstrate that the performance derived analytically by the theory agrees well with the numerical simulations, which confirms the validity of the proposed model and leads to optimal system design. The present study paves the way for improving the effectiveness of correlated time series for decision-making, impacting artificial intelligence and other applications.

probability, threshold level, time sery, (15 more...)

arXiv.org Artificial Intelligence

2203.16004

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > United States (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Data Science > Data Mining > Big Data (0.69)

Add feedback

Contextual Bandits Evolving Over Finite Time

Deshpande, Harsh, Jain, Vishal, Moharir, Sharayu

arXiv.org Machine LearningNov-14-2019

--Contextual bandits have the same exploration-exploitation tradeoff as standard multi-armed bandits. On adding positive externalities that decay with time, this problem becomes much more difficult as wrong decisions at the start are hard to recover from. We explore existing policies in this setting and highlight their biases towards the inherent reward matrix. We propose a rejection based policy that achieves a low regret irrespective of the structure of the reward probability matrix. In the context of restaurant recommendation systems, users can generally be classified into multiple user types with different preferences for different restaurants.

probability, restaurant, reward probability, (11 more...)

arXiv.org Machine Learning

1911.05956

Country: Asia > India > Maharashtra > Mumbai (0.05)

Genre: Research Report (0.40)

Industry: Consumer Products & Services > Restaurants (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.54)
Information Technology > Data Science > Data Mining > Big Data (0.35)

Add feedback

Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

Tamatsukuri, Akihiro, Takahashi, Tatsuji

arXiv.org Artificial IntelligenceDec-14-2018

As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a \textit{satisficing} strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing ($RS$) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to the $K$-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that $RS$ is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of $RS$ is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of $RS$ with that of other representative algorithms for the $K$-armed bandit problems.

artificial intelligence, health & medicine, probability, (20 more...)

arXiv.org Artificial Intelligence

1812.05795

Country: Asia > Japan > Honshū > Kantō (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.46)
Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Been There, Done That: Meta-Learning with Episodic Recall

Ritter, Samuel, Wang, Jane X., Kurth-Nelson, Zeb, Jayakumar, Siddhant M., Blundell, Charles, Pascanu, Razvan, Botvinick, Matthew

arXiv.org Artificial IntelligenceMay-24-2018

Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins. When tasks reoccur - as they do in natural environments - metalearning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious environments, then develop a meta-learning architecture for solving these environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of agents with this episodic LSTM in five meta-learning environments with reoccurring tasks, ranging from bandits to navigation and stochastic sequential decision problems.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1805.09692

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Add feedback

Category Theoretic Analysis of Photon-based Decision Making

Naruse, Makoto, Kim, Song-Ju, Aono, Masashi, Berthel, Martin, Drezet, Aurélien, Huant, Serge, Hori, Hirokazu

arXiv.org Artificial IntelligenceMay-9-2018

Decision making is a vital function in this age of machine learning and artificial intelligence, yet its physical realization and theoretical fundamentals are still not completely understood. In our former study, we demonstrated that single-photons can be used to make decisions in uncertain, dynamically changing environments. The two-armed bandit problem was successfully solved using the dual probabilistic and particle attributes of single photons. In this study, we present a category theoretic modeling and analysis of single-photon-based decision making, including a quantitative analysis that is in agreement with the experimental results. A category theoretic model reveals the complex interdependencies of subject matter entities in a simplified manner, even in dynamically changing environments. In particular, the octahedral and braid structures in triangulated categories provide a better understanding and quantitative metrics of the underlying mechanisms of a single-photon decision maker. This study provides both insight and a foundation for analyzing more complex and uncertain problems, to further machine learning and artificial intelligence.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1602.08199

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.88)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

The Importance of Constraint Smoothness for Parameter Estimation in Computational Cognitive Modeling

Nunes, Abraham, Rudiuk, Alexander

arXiv.org Machine LearningMar-23-2018

Psychiatric neuroscience is increasingly aware of the need to define psychopathology in terms of abnormal neural computation. The central tool in this endeavour is the fitting of computational models to behavioural data. The most prominent example of this procedure is fitting reinforcement learning (RL) models to decision-making data collected from mentally ill and healthy subject populations. These models are generative models of the decision-making data themselves, and the parameters we seek to infer can be psychologically and neurobiologically meaningful. Currently, the gold standard approach to this inference procedure involves Monte-Carlo sampling, which is robust but computationally intensive---rendering additional procedures, such as cross-validation, impractical. Searching for point estimates of model parameters using optimization procedures remains a popular and interesting option. On a novel testbed simulating parameter estimation from a common RL task, we investigated the effects of smooth vs. boundary constraints on parameter estimation using interior point and deterministic direct search algorithms for optimization. Ultimately, we show that the use of boundary constraints can lead to substantial truncation effects. Our results discourage the use of boundary constraints for these applications.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

1803.09018

Country: North America (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback